SUMMA: scalable universal matrix multiplication algorithm
Authors
Abstract
In this paper, we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less workspace. MPI implementations are given, as are performance results on the Intel Paragon system.
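The paper's own MPI implementations are not reproduced on this page. As a rough illustration of the broadcast-based scheme SUMMA is known for, the following is a minimal C/MPI sketch, assuming a square process grid, a block (non-cyclic) data distribution, square local blocks, and a panel width equal to the local block size; the names used here (summa, local_gemm, nb, q) are illustrative and not taken from the paper.

/* summa_sketch.c - a minimal sketch of a SUMMA-style broadcast scheme,
 * NOT the authors' original code.  Assumes a square q x q process grid,
 * an equal nb x nb block on every process, and panel width == nb.
 * Build: mpicc summa_sketch.c -o summa ; run with a square rank count. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Local dense update C += A * B; all blocks are nb x nb, row-major. */
static void local_gemm(int nb, const double *A, const double *B, double *C)
{
    for (int i = 0; i < nb; ++i)
        for (int k = 0; k < nb; ++k) {
            double aik = A[i * nb + k];
            for (int j = 0; j < nb; ++j)
                C[i * nb + j] += aik * B[k * nb + j];
        }
}

/* One SUMMA-style multiplication on a q x q grid; Aloc/Bloc/Cloc are the
 * nb x nb blocks owned by this process (block, non-cyclic distribution). */
static void summa(int nb, int q, int myrow, int mycol,
                  const double *Aloc, const double *Bloc, double *Cloc,
                  MPI_Comm rowcomm, MPI_Comm colcomm)
{
    double *Abuf = malloc((size_t)nb * nb * sizeof *Abuf);
    double *Bbuf = malloc((size_t)nb * nb * sizeof *Bbuf);

    for (int k = 0; k < q; ++k) {
        /* Process column k owns the current panel of A: broadcast it along rows. */
        if (mycol == k) memcpy(Abuf, Aloc, (size_t)nb * nb * sizeof *Abuf);
        MPI_Bcast(Abuf, nb * nb, MPI_DOUBLE, k, rowcomm);

        /* Process row k owns the current panel of B: broadcast it along columns. */
        if (myrow == k) memcpy(Bbuf, Bloc, (size_t)nb * nb * sizeof *Bbuf);
        MPI_Bcast(Bbuf, nb * nb, MPI_DOUBLE, k, colcomm);

        /* Rank-nb update of the local block of C. */
        local_gemm(nb, Abuf, Bbuf, Cloc);
    }
    free(Abuf);
    free(Bbuf);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int q = 0;                                 /* q = floor(sqrt(nprocs)) */
    while ((q + 1) * (q + 1) <= nprocs) ++q;
    if (q * q != nprocs) {                     /* sketch requires a square rank count */
        if (rank == 0) fprintf(stderr, "run with a square number of processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    int myrow = rank / q, mycol = rank % q;

    /* Row and column communicators of the 2D process grid. */
    MPI_Comm rowcomm, colcomm;
    MPI_Comm_split(MPI_COMM_WORLD, myrow, mycol, &rowcomm);
    MPI_Comm_split(MPI_COMM_WORLD, mycol, myrow, &colcomm);

    int nb = 256;                              /* local block size (arbitrary example) */
    double *A = calloc((size_t)nb * nb, sizeof *A);
    double *B = calloc((size_t)nb * nb, sizeof *B);
    double *C = calloc((size_t)nb * nb, sizeof *C);
    for (int i = 0; i < nb * nb; ++i) { A[i] = 1.0; B[i] = 1.0; }

    summa(nb, q, myrow, mycol, A, B, C, rowcomm, colcomm);

    free(A); free(B); free(C);
    MPI_Comm_free(&rowcomm); MPI_Comm_free(&colcomm);
    MPI_Finalize();
    return 0;
}

In each step, one row broadcast, one column broadcast, and a local rank-nb update suffice, which is why the scheme needs little workspace and maps naturally onto collective communication.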
Similar resources
Parallel Matrix Multiplication: 2D and 3D (FLAME Working Note #62), Martin Schatz
We describe an extension of the Scalable Universal Matrix Multiplication Algorithms (SUMMA) from 2D to 3D process grids; the underlying idea is to lower the communication volume through storing redundant copies of one or more matrices. While SUMMA was originally introduced for block-wise matrix distributions, so that most of its communication was within broadcasts, this paper focuses on element...
A Fast Scalable Universal Matrix Multiplication Algorithm on Distributed-Memory Concurrent Computers
We present a fast and scalable matrix multiplication algorithm on distributed memory concurrent computers, whose performance is independent of data distribution on processors, and call it DIMMA (Distribution-Independent Matrix Multiplication Algorithm). The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectivel...
Task-Based Algorithm for Matrix Multiplication: A Step Towards Block-Sparse Tensor Computing
Distributed-memory matrix multiplication (MM) is a key element of algorithms in many domains (machine learning, quantum physics). Conventional algorithms for dense MM rely on regular/uniform data decomposition to ensure load balance. These traits conflict with the irregular structure (block-sparse or rank-sparse within blocks) that is increasingly relevant for fast methods in quantum physics. T...
Algorithmic-Based Fault Tolerance for Matrix Multiplication on Amazon EC2
Cloud computing presents a unique alternative to traditional computing approaches for many users and applications. The goals of this project were to assess the viability of the cloud for scientific computing applications, and to explore fault tolerance as a mechanism for maintaining high performance in this variable and unpredictable environment. Most previous attempts to run scientific computa...
A new parallel matrix multiplication algorithm on distributed-memory concurrent computers
We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas; it uses a modified pipelined communication scheme to overlap computation and communication effectively, and exploits the LCM block concept...
Journal: Concurrency: Practice and Experience
Volume: 9, Issue: -
Pages: -
Year of publication: 1997